Comparative analysis of bats and rodents’ genomes suggests a relation between non-LTR retrotransposons, cancer incidence, and ageing

The presence in nature of species showing drastic differences in lifespan and cancer incidence has recently increased the interest of the scientific community. In particular, the adaptations and the genomic features underlying the evolution of cancer-resistant and long-lived organisms have recently focused on transposable elements (TEs). In this study, we compared the content and dynamics of TE activity in the genomes of four rodent and six bat species exhibiting different lifespans and cancer susceptibility. Mouse, rat, and guinea pig genomes (short-lived and cancer-prone organisms) were compared with that of naked mole rat (Heterocephalus glaber) which is a cancer-resistant organism and the rodent with the longest lifespan. The long-lived bats of the genera Myotis, Rhinolophus, Pteropus and Rousettus were instead compared with Molossus molossus, which is one of the organisms with the shortest lifespan among the order Chiroptera. Despite previous hypotheses stating a substantial tolerance of TEs in bats, we found that long-lived bats and the naked mole rat share a marked decrease of non-LTR retrotransposons (LINEs and SINEs) accumulation in recent evolutionary times.

www.nature.com/scientificreports/ species that have not evolved biological adaptations to counteract cancer incidence as a function of body size (Peto's paradox) 26 . In fact, animals with a large body mass evolved mechanisms to contrast the development of tumours such as the control of the telomerase expression 27 , or the expansion in copy number of coding genes and miRNAs 28,29 . Small mammals have a smaller number of cells and generally shorter life cycles 30 , therefore they are disentangled from the Peto's paradox. For this reason, we selected mammals with a body mass lower than 2 kg to investigate the relationship between lifespan and TEs avoiding biases related to adaptations to large body masses. Thus, we tested the hypothesis that the genomes of cancer-prone and short-lived species present a higher load of recently inserted TEs compared to cancer-resistant and long-lived species.

Results
In this study we investigated the possible link between TEs, cancer incidence, and ageing by focusing on the accumulation patterns of TEs in genomes belonging to species showing different lifespans and cancer incidence (Fig. 1). We collected genome assemblies of similar high quality and generated a TE library (Data S1) for the species that previously lacked one to both maximise the presence of TEs in the assemblies and their correct annotation (Tables 1, 2). The de novo TE characterisation highlighted a higher percentage of TEs in the last genome assembly version of H. glaber based on PacBio technology compared to its first version based on short reads (Fig. 2).
Then, we proceeded to compare the accumulation profiles of TEs between long-(H. glaber) and short-lived rodents (M. musculus, R. norvegicus, C. porcellus) using the output of RepeatMasker (Data S2-S11). We observed a reduced accumulation of transposable elements in the naked mole rat (H. glaber) in recent times with respect to the short-lived rodents (Fig. 3).

Figure 1.
List of species analysed together with lifespan and body mass information. (a) Rodents with short lifespan and high cancer incidence are indicated in red while species with long lifespan and low cancer incidence are indicated in green. (b) Bats with short lifespan and low cancer incidence are indicated in orange while species with long lifespan and low cancer incidence are indicated in green. This information was retrieved from https:// genom ics. senes cence. info/ speci es/ index. html and Speakman et al. 31 . The same analysis was performed in bats. Bat genomes show a high accumulation of class II transposons in all species (Table 2) as well as a drop in the accumulation of non-LTR retrotransposons in recent times (Fig. 4).
The genomes of the analysed species are mainly composed of retrotransposons (SINEs, LINEs, LTRs). The main difference between the two orders is that all six bat genomes show a higher percentage of class II transposons (DNA transposons and Helitrons) compared to rodents, with a minimum percentage present in M. molossus (3.72%) and a maximum percentage in M. myotis and M. lucifugus (> 11%; Table 2).
Rodents, in general, show a limited amount of class II transposons with the maximum abundance present in H. glaber (2.3%). The genome of H. glaber has the largest genome size and its percentage of LINE retrotransposons is relatively high compared to the other rodents (Table 1). Among bats, M. myotis and M. lucifugus show a marked accumulation of Helitrons (6.93 and 6.18% respectively) in comparison with the other species that show an abundance of Helitrons below 1%. Notably, the short-lived M. molossus show the largest genome size and the highest percentage of non-LTR retrotransposons (SINEs and LINEs; Table 2). To investigate how the TE content of rodents and bats changed over time and detect the most recently active transposable elements, we used the RepeatMasker annotation (see "Methods" section) to generate a TE "genomic landscape" for each of the species analysed. The TE landscapes are a visualisation of the proportion of repeats in base pairs (Y-axis) at different levels of divergence (X-axis) calculated as a Kimura 2-p distance 31 between the insertions annotated and their respective consensus sequences (Figs. 3, 4). The TE landscapes in rodents are mostly dominated by retrotransposons and all of them show an ancestral peak of accumulation between 20 and 30% of divergence (Fig. 3). A small percentage of those ancient TEs is composed of relics of DNA transposons (blue in Fig. 3). The cancer-resistant rodent H. glaber show the highest accumulation of retrotransposons dominated by LINEs (green) between 5 and 10% of divergence followed by a dramatic drop corresponding to the most recent history of this genome (Fig. 3). On the other hand, the three cancer-prone rodents accumulated a large number of retrotransposons in recent times. M. musculus and R. norvegicus share a peak of accumulation between 15 and 20% of divergence, whereas R. norvegicus maintains a stable rate of retrotransposition of SINEs. In comparison to www.nature.com/scientificreports/   www.nature.com/scientificreports/ bats, rodents do not present high levels of class II transposons (Tables 1 and 2). The TE landscapes of the six bats ( Fig. 4) show a higher intra-order diversity in comparison with the four rodents that are homogenous (Fig. 3). In particular, the two megabats belonging to the Pteropodidae family (Rousettus aegyptiacus and Pteropus vampyrus) presented the lowest TE accumulation (  32 , are more heterogeneous at the level of TE accumulation. In fact, the two species belonging the Myotis genus share their highest peak of accumulation at 15% of divergence, whereas M. molossus has a second more recent taxonspecific peak at 8% of divergence. Despite the two Myotis species having the most similar landscapes (likely due to the closer phylogenetic relationship), M. lucifugus shows a higher accumulation and diversity of TEs in its recent history (0-3% divergence). On the other hand, M. myotis accumulated LTRs with a peak between 4 and 5% of divergence and, in general, the genus Myotis has the most pronounced accumulation of DNA transposons and Helitrons. Finally, we can affirm that in all the bat genomes here considered, the non-LTR retrotransposons (LINEs and SINEs) show a "glaber-like" dynamic with a decrease of accumulation in correspondence of their most recent evolutionary history (Fig. 4). Since we observed that long-lived species of both orders show little accumulation of non-LTR retrotransposons in recent times, we decided to compare in more detail their activity focusing on elements with a divergence lower than 3% (a proxy for recently inserted TEs) and their density of insertion (DI; Fig. 5).
The density of insertion was calculated as the ratio between the number of non-LTR retrotransposon (SINEs and LINEs) insertions and the assembly size in Gigabase (Fig. 5, Table S1). DI is a parameter used to compare the magnitude of the recent TE accumulation and its potential impact between species with different genome sizes 4 . When applying this measure in rodents, we found that the H. glaber showed the lowest DI (4,551) compared to the short-lived and cancer-prone species (Fig. 5). Then, we found lower DIs in bats than in rodents ( Fig. 5) except for M. molossus (8,269). The lowest DI (< 1000) was observed in R. aegyptiacus and P. vampyrus. The remaining species M. myotis, M. lucifugus and R. ferrumequinum have a DI ranging between 2000 and 5000. In general, the long-lived bats presented a lower DI compared to the short-lived M. molossus (Fig. 5). We also calculated the density of insertion of these retrotransposons per non-overlapping windows (0.5, 1, 1.5 Mb size) across the entire genomes (Figures S1-S3, Table S2). The resulting distribution of TE densities mimic the general DI.
We made a further detailed analysis of LINEs looking for potentially active autonomous non-LTR retrotransposons: we looked for the presence of LINE-related open reading frames (ORFs) with intact protein domains. By doing this, we found that all the analysed rodent genomes present intact LINE ORFs but the genome of H. glaber (Fig. 5). In bats, intact LINE ORFs were found only in M. myotis, M. lucifugus and R. ferrumequinum (Fig. 5). Moreover, we found that M. molossus, R aegyptiacus, and P. vampyrus genome have no intact ORFs. Then, we tested if short-and long-lived species statistically differ in terms of density of insertion. We pooled together the four rodents and six bats to apply the Wilcoxon signed rank paired test. Short-lived species were paired with species showing two-fold longer lifespan for a total of 25 comparisons (see "Methods" section). The test resulted in a p-value of 5.96 × 10 -8 (Table S3). Finally, we explored the possible correlation between the density of young non-LTR retrotransposons and gene, exon, intron, intergenic region densities but no strong correlation was found (Table S4). Similarly, we tested www.nature.com/scientificreports/ for the correlation between the TE density and gene density in gene-rich and gene-poor regions, but no general pattern was found (Table S5a-b).

Discussion
Transposable element activity and accumulation can have manifold effects on genomes and biological phenotypes. Multiple studies have linked TEs to ageing and the development of several diseases including cancer 14,[33][34][35][36] .
Here, we have studied the relationship between TEs and two different aspects of mammal life: longevity and cancer incidence. In rodents, the short lifespan is associated with the presence of cancer 37 . On the other hand, bats are considered cancer-resistant species 38 (Fig. 1b). The animals considered in this study are known to have different lifespans while sharing similar, small, body sizes (< 2 kg). H. glaber is the rodent with the longest lifespan known (31 years) and resistant to cancer while the other rodents show shorter lifespans of 12 (C. porcellus), 4 (M. musculus) and 3.8 (R. norvegicus) years. Chiroptera species, as far as it is currently known, are long-lived species and resistant to cancer (only a few confirmed cases have been reported 39 ). Since TEs have been extensively linked to the development of cancer, we investigated the TE content of bats and rodents to see if it is possible to find shared features between long-lived bat and rodent species.
To analyse the TE content of these organisms, we relied on the use of high-quality genome assemblies based on long reads that better represent the actual genomic repetitive content with respect to assemblies based on short reads 1 . The use of long read-based genome assemblies is particularly important when analysing young TE insertions (as in this study) that are highly homogeneous and tend to be underrepresented in short read assemblies 2,40 . On top of that, the combination of long read assemblies and custom TE libraries maximises the representation and annotation of the transposable element content as highlighted here for the genome of H. glaber (Fig. 2). The first TE annotation of H. glaber without a custom TE library showed a repetitive content of about 25% 41 but the use of a proper TE library increased the content to ~ 34% (Fig. 2). Finally, the use of long reads increased the total content of H. glaber TEs to 38% (Fig. 2, Table 1).
By analysing the TE annotations, we found that the main difference between short-and long-lived species of rodents is represented by a drop in non-LTR retrotransposon accumulation at recent times (0-5% divergence; Fig. 3). Similarly, the long-lived species of bats showed a drop in non-LTR retrotransposon accumulation at recent times ( Fig. 4) while presenting an overall accumulation of class II transposons (DNA transposons and Helitrons). Previous studies hypothesised that bats have a higher tolerance for the activity of transposable elements with alternative ways to dampen potential health issues due to this activity 38,42 . Given our observation on the shared drop of non-LTR retrotransposons accumulation in bats and in H. glaber, we add to the aforementioned hypothesis, that the specific repression of non-LTR retrotransposon activity may enhance cancer resistance. In fact, the non-LTR retrotransposons are the most prevalent types of TEs in rodents (Fig. 3, Table 1) and the most extensively investigated by biomedical research given that they are the only active TEs in the human genome 43 .
Since the total landscapes of TEs include remote evolutionary dynamics which are unlikely to be associated with the genetics and physiology of cancer and aging, we compared the most recent accumulation of non-LTR retrotransposons between the long-lived and short-lived species considered in this study using the density of recent insertions (0-3% of divergence from consensus). As expected, cancer-prone species present a higher load of recently inserted non-LTR retroelements than cancer-resistant species (Fig. 5). While lifespan of rodents showed a strong relation with the recent activity of non-LTR retrotransposons (Fig. 5), the same type of relation for bats is less clear. However, based on DI comparisons, we noticed that the long-lived bats show a DI similar to the sole long-lived cancer-resistant rodent (H. glaber), while the sole short-lived bat (M. molossus) shows a DI value more similar to short-lived rodents. Finally, by comparing species with large differences in their lifespans, we reject the hypothesis of independency of non-LTR retrotransposons accumulation with ageing in small mammals (Wilcoxon test, p-value: 5.96 × 10 -8 ).
Given the pattern of DI observed in rodents, we speculate that the recent accumulation of non-LTRs may be related to the lifespan of these species (Fig. 1a, 5, 6a). On top of that, we also found that the genome of H. glaber does not contain intact ORFs of LINEs, which is another clue for the absence of currently active non-LTR retrotransposons (Fig. 5). The absence of intact LINE ORFs in the long read genome assembly confirms the same observation previously reported on the short read based genome assembly 44 . In bats, we observed that M. molossus (shortest lifespan in bats), showed the highest DI, and no intact LINE ORFs (Fig. 5) but has a very high percentage of non-autonomous SINEs compared to long-lived bats ( Table 2). SINEs are about 300 bp long and do not present any protein coding sequence, which makes them unable to retrotranspose autonomously: SINEs rely on the exploitation of the retrotransposition machinery provided by LINEs to move throughout the genome (Fig. 6a).
Given that the M. molossus genome lacks intact LINE ORFs, it is reasonable to assume that both LINEs and SINEs are currently inactive in this species (Fig. 6b). This begs the question of why M. molossus has a short lifespan. The density of recently inserted non-LTR elements in M. molossus is 8269 per Gb, which is comparable to the value in Cavia porcellus (9,163 insertions per Gb). In contrast, the long-lived rodent Heterocephalus glaber has a DI of only 4,550 insertions per Gb, while the long-lived Myotis lucifugus has 4,531 insertions per Gb. These findings suggest that future studies on bats should consider two phenomena: first, the potential impact of the accumulation of LINE and SINE insertions on age-related conditions, and second, the cytotoxic effects of an excess of transcribed RNA from these retrotransposons.
A large number of very similar insertions across the chromosomes may cause structural polymorphisms (e.g., through non-allelic homologous recombination events) independently from the retrotransposition itself 45 . For example, in the recent history of the human genome, LINE1-driven ectopic recombination events caused genomic rearrangements that can be responsible for several diseases 46 . Moreover, it has been observed in the African killifishes that an accumulation of detrimental polymorphisms following a burst of TEs can undergo phases of www.nature.com/scientificreports/ relaxed selection were the mutations eventually promote age-associated diseases and reduced lifespan 47 . Among the killifishes, the turquoise killifish has a lifespan of 4-8 months (likely the shortest lifespan among vertebrates 47 ) but despite the accumulation of mutations that leads to a fast ageing, this species is fully adapted to its ecological niche demonstrating that a rapid ageing is not necessarily linked to reduced fitness (Fig. 6b). In addition, SINEs can also exert physiological effects by being transcribed. For example, in humans, SINEs of the Alu family have been found to be involved in the inflammatory pathway: upon viral infection, the interferon 1 mediates the overexpression of Alu sequences which function as an amplifier of the immune response by stimulating the intracellular viral RNA sensor systems 48 . Despite the involvement of SINEs in the human immune response, the same upregulation of Alu elements, or the excessive sensitivity of the viral RNA sensors, has been linked to autoimmune diseases 49 . Furthermore, as non-coding RNAs are emerging as essential components of neuroinflammation, a recent study has included Alu SINEs as possible promoters of neurodegenerative diseases 50 . More in general, the increased transcription of non-LTR retrotransposons (both LINEs and SINEs) in humans may contribute to so-called "sterile inflammation" (Fig. 6b), a phenomenon for which a chronic state of inflammation is triggered without the presence of any obvious pathogen and that is exacerbated with age ("inflammaging") 51 . Therefore, we speculate that the marked accumulation of SINEs in M. molossus might trigger phenomena similar to the sterile inflammation and inflammaging that can cause a shortening of its lifespan. For these reasons M. molossus, together with the well-known H. glaber, may be an exceptional model for biogerontology. Further bioinformatic research can be developed in this respect and may lead to a more precise understanding of the regulation of TEs in the study of longevity and oncology and potential new biomedical applications.

Methods
Samples. In order to avoid biases related to potential underestimations of the TE content, in this study we chose those species of mammals from which high-quality genome assemblies were available. The genome assemblies for the 10 53 . The species selected belong to Rodentia and Chiroptera orders. Their most recent common ancestor dates back to the Mesozoic era 90 Mya ago 56 . The large evolutionary distance among these two orders allowed us to find genomic features potentially conserved in all mammals. Body mass is generally positively related to the lifespan 30 therefore, in our study, we chose to analyse genomes belonging to small-sized mammals (between 12 g and 1.1 kg). The comparisons were carried out using species belonging to  (Fig. 4), that may have resulted in the accumulation of ageing enhancing mutations, and the continuous transcription of these retroelements may trigger cellular inflammation and cause a widespread sterile inflammation. In this scenario, the fitness of M. molossus is not impaired but its lifespan is reduced. www.nature.com/scientificreports/ the same taxonomical order to maximize their shared evolutionary history and life history traits. These species were also selected and compared based on their lifespans (Fig. 1). The genome assemblies analysed are all based on Sanger or PacBio technologies and selected to maximize their assembled repetitive content 1 . The short 41 (GCA_000230445.1) and long read-based 54 genome assemblies of H. glaber were compared in order to estimate the differences in TE content due to technological biases.

Repetitive element libraries and annotation.
To analyze the content of transposable elements in genomes, the genome assemblies must be annotated with proper libraries of TE consensus sequences. A consensus sequence represents the reference sequence of a specific subfamily of transposable elements. The repeat annotation software (e.g., RepeatMasker) uses such libraries of consensus sequences to find instances (or insertions) of each repetitive element within the genome assemblies. From the alignments of the TE insertions found in the genome assembly and their consensus sequences, it is possible to estimate the age of such insertions based on the genetic distance given by the alignments: the higher the distance, the higher the age of the insertions. All the species analysed have a species-specific repeat library already available except for H. glaber. Therefore, we made a de novo repeat library of H. glaber using RepeatModeler2 with the option -LTRstruct 57 . The resulting consensus sequences were merged with the ancient mammalian TE families (LINE2 and MIR SINEs) downloaded from RepBase (https:// www. girin st. org/ repba se). All the genome assemblies of rodents (but H. glaber) were masked using RepeatMasker (v. 4.1.0) 58 and the Rodentia-specific library (Repbase release 20181026). Similarly, the bat genome assemblies were masked using the Chiroptera-specific library from Repbase merged with the curated TE libraries produced by Jebb et al. 55 . The RepeatMasker annotations were performed using the options -a -xsmall -gccalc -excln. The annotations were then visualized as "genomic landscapes": barplots that show in the X-axis the genetic distances from the consensus sequences (calculated as Kimura 2-p distance 31 ) and in Y-axis the percentage of the genome occupied by each TE category (Figs. 3, 4). The barplots were made using the ggplot2 R package. For each species, the genome assembly size and the percentage of each TE categories (Tables 1, 2, Fig. 2) were retrieved from the table files produced by RepeatMasker.

Density of young non-LTR retrotransposons.
To evaluate the recent activity of non-LTR retrotransposons (LINEs and SINEs), we selected all the non-LTR retrotransposon hits found by RepeatMasker with a divergence from consensus lower than 3%. The 3% threshold is a proxy for most recently inserted elements 3,4 . To minimise the overestimation of insertions due to the fragmentation of the repeat annotation, we controlled for fragments belonging to the same insertion as RepeatMasker assign them the same identifier. Then we calculated the density of insertion (DI) for each species as the ratio between the number of recent insertions and the corresponding assembly size expressed in Gigabase: We expect that the long-lived species are associated with lower DI with respect to lower lived species. To test this hypothesis, we pooled together bats and rodents creating all possible comparisons of long-lived with short-lived species. Each pair of species were chosen as follows: long-lived species lifespan must be at least two-fold longer than the second species' lifespan; for example: H. glaber (31 years) versus M. molossus (5.6 years). A total of 25 pairs of species were then compared (Table S3). To test this association, we chose Wilcoxon signed rank paired test (one tailed; alpha: 0.05) since the data are not normally distributed. We also calculated the density of the number of young non-LTR retrotransposon insertions across the genomes in non-overlapping windows of 0.5, 1 and 1.5 Mb, to then calculate the mean density and standard deviation.

Density of young non-LTR retrotransposons and their correlation to genomic features.
To understand if the densities of insertions found in these genomes correlate with the density of genes, exons, introns, and intergenic regions, we collected all the gene annotations available for the species of this study. We found available on NCBI the gene annotations for all the genome assemblies except for Heterocephalus glaber, Pteropus vampyrus and Myotis lucifugus. We then calculated the density of these genomic features for non-overlapping windows of 0.5, 1 and 1.5 Mb as the number of features per unit of window size. To test the correlation between the TE density and the other genomic features, we used the Spearman rank correlation test.
In addition, we tested if there was a correlation between the TE density and gene density in gene-rich and gene-poor regions (windows) specifically. We categorized gene-rich and gene-poor windows based on their gene density distribution. The windows showing a gene density value less or equal to the 0.25 percentile of the distribution were categorized as gene poor. The windows with a gene density value greater or equal to the 0.75 percentile of the gene density distribution were categorized as gene rich. The correlation between the TE and gene densities were tested with the Spearman rank correlation test. Finally, we tested if the TE density in gene-rich and gene-poor regions (windows) statistically differed by using the Wilcoxon rank sum test.

Intact ORFs detection.
To check the presence of intact LINE retrotransposons (therefore potentially active), we looked for complete open reading frames (ORFs) that encode for the enzymatic machinery used by LINEs to retrotranspose themselves and the non-autonomous SINE elements. In each species the sequences of LINEs were obtained with BEDTools getfasta command 59 using the LINE coordinates annotated by Repeat-Masker. A fasta file with the LINE sequences for each species was produced and used as input for the R script orfCheker.R (https:// github. com/ james dgalb raith/ Ortho logue Regio ns/ blob/ master/ orfCh ecker.R) to find intact www.nature.com/scientificreports/